(57) ABSTRACT

A proximity-oriented redirection system for service-to-client
attachment in a virtual overlay distribution network. The
virtual overlay distribution network includes addressable
routers for routing packet traffic, wherein a packet of data is
routed from a source node to a destination node based on
address fields of the packet. The invention includes a
redirector coupled to at least one of the addressable routers and
includes: logic for accepting a service request from a client;
logic for determining a selected server for handling the
service request, the selected server being one of a plurality
of servers that can handle the service request; and logic for
generating a redirection message directed to the client for
redirecting the service request to the selected server.
PROXIMITY-BASED REDIRECTION SYSTEM FOR ROBUST AND SCALABLE
SERVICE-NODE LOCATION IN AN INTERNETWORK

CROSS-REFERENCES TO RELATED APPLICATIONS

This application claims priority from a co-pending U.S.
Provisional Patent Application No. 60/152,257 filed Sep. 3,
1999. This application is related to U.S. patent application
Ser. No. 09/323,869 entitled "PERFORMING MULTICAST
COMMUNICATION IN COMPUTER NETWORKS BY USING OVERLAY
ROUTING" filed on Jun. 1, 1999 and to U.S. Provisional Patent
Application No. 60/115,454 entitled "SYSTEM FOR PROVIDING
APPLICATION-LEVEL FEATURES TO MULTICAST ROUTING IN COMPUTER
NETWORKS" filed on Jan. 11, 1999. The disclosures of each of
the above identified applications are incorporated in their
entirety herein by reference for all purposes.

FIELD OF THE INVENTION

This invention relates generally to the field of data
networks, and more particularly, to the distribution of
information on a data network.

BACKGROUND OF THE INVENTION

One of the pivotal challenges in scaling the Internet
infrastructure for mass adoption is the problem of
distributing arbitrary content from a sourcing site to many users in
the Internet in an efficient, viable, and cost effective fashion.
The dissemination of popular news articles, video
broadcasts, stock quotes, new releases of popular software,
and so forth all can result in the so-called flash effect, where
large numbers of users spread across the network all try to
retrieve the same content from the same server at roughly the
same time. Not only does a traffic flash bring a server to its
knees, but it also wastes network bandwidth because many
redundant copies of the same content flow across the
wide-area network. For example, a breaking news event on
CNN's web site could cause millions of users to fetch the
article's text off their server. Likewise, the premiere run of
a high-visibility movie broadcast over the Internet could
similarly encourage millions of users to attempt to access the
media content server.

Two key mechanisms for the Web have been proposed to
overcome the problems induced by the flash effect, namely,
caching and server replication. In caching, a cache is
situated at a strategic location within the network infrastructure
to intercept content requests from the clients. When the
cache receives a content request, it consults its store of
content and if the requested data is present, the cache serves
the request locally. Otherwise, the request is relayed to the
origin server and the response is relayed back to the client.
During this process the cache stores the response in its local
store. Many strategies have been proposed for managing the
local store, e.g., deciding when to discard an object from the
cache, when to refresh an object that may be different from
the server, and so forth. Caches may be non-transparent, in
which the client is explicitly configured with the cache's
network address, or transparent, in which the client is
ignorant of the cache and the cache intercepts the content
request transparently, e.g., using a layer-4 switch.

In server replication, servers are deployed across the wide
area and clients are assigned to these distributed servers to
balance the load and save network bandwidth. These
replicated servers may have some or all of the content contained
at the origin server and many variations exist for how a
particular arrangement of servers is deployed, how content
is distributed to them from the master server, and how clients
are assigned to the appropriate server.

Much of the technology that has been developed to
support these types of server replication and caching
technologies is ad hoc and incongruent with the underlying
Internet architecture. For example, common techniques for
transparent caching break the semantics of TCP and are thus
incompatible with certain modes of the underlying IP packet
service like multipath routing. This leads to a number of
management problems and, in particular, does not
provide a cohesive network architecture that can be
managed in a sensible fashion from a network operations center.

A similar content distribution problem involves the
delivery of live streaming media to many users across the
Internet. Here, a server produces a live broadcast feed and
clients connect to the server using streaming media transport
protocols to receive the broadcast. However, as more and
more clients tune in to the broadcast, the server and network
become overwhelmed by the task of delivering a large
number of packet streams to a large number of clients.

One solution to this live broadcast problem is to leverage
the efficiency of network layer multicast, or IP Multicast as
defined in the Internet architecture. In this approach, a server
transmits a single stream of packets to a "multicast group"
rather than sending a separate copy of the stream to each
individual client. In turn, receivers interested in the stream
in question "tune in" to the broadcast by subscribing to the
multicast group (e.g., by signaling to the nearest router the
subscription information using the Internet Group
Management Protocol, IGMP). The network efficiently delivers the
broadcast to each receiver by copying packets only at fan out
points in the distribution path from the source to all
receivers. Thus, only one copy of each packet appears on any
physical link.

Unfortunately, a wide variety of deployment and
scalability problems have confounded the acceptance and
proliferation of IP Multicast in the global Internet. Many of these
problems follow fundamentally from the fact that computing
a multicast distribution tree requires that all routers in the
network have a uniformly consistent view of what that tree
looks like. In multicast, each router must have the correct
local view of a single, globally consistent multicast routing
tree. If routers have disparate views of a given multicast tree
in different parts of the network, then routing loops and
black holes are inevitable. A number of other problems
(e.g., multicast address allocation, multicast congestion
control, reliable delivery for multicast, etc.) have also
plagued the deployment and acceptance of IP Multicast.
Despite substantial strides in the last couple of years toward
commercial deployment of multicast, the resulting
infrastructure is still relatively fragile and its reach is extremely
limited.

In addition to the substantial technical barriers to the
deployment of a ubiquitous Internet multicast service, there
are business and economic barriers as well. Internet service
providers have not had much success at offering wide-area
multicast services because managing, monitoring, and
provisioning for multicast traffic is quite difficult. Moreover, it
is difficult to control who in a multicast session can generate
traffic and to what parts of the network that traffic is allowed
to reach. Because of these barriers, a multicast service that
reaches the better part of the Internet is unlikely to ever
emerge. Even if it does emerge, the process will undoubtedly
take many years to unfold.

To avoid the pitfalls of multicast, others have proposed
that the streaming-media broadcasts be enabled by an
application-level solution called a splitter network. In this
approach, a set of servers distributed across the network are
placed at strategic locations within the service providers'
networks. These servers are provided with a "splitting"
capability, which allows them to replicate a given stream to
a number of downstream servers. With this capability,
servers can be arranged into a tree-like hierarchy, where the
root server sources a stream to a number of downstream
servers which in turn split the stream into a number of
copies that are forwarded to yet another tier of downstream
servers.

Unfortunately, a splitter network of servers is plagued
with a number of problems. First, the tree of splitters is
statically configured, which means that if a single splitter
fails, the entire sub-tree below the point of failure loses
service. Second, the splitter network must be oriented
toward a single broadcast center, requiring separate splitter
networks composed of distinct physical servers to be
maintained for each broadcast network. Third, since the splitter
abstraction is based on an extension of a media server, it is
necessarily platform dependent, e.g., a RealNetworks-based
splitter network cannot distribute Microsoft Netshow traffic.
Fourth, splitter networks are highly bandwidth inefficient
since they do not track receiver interest and prune traffic
from sub-trees of the splitter network that have no
downstream receivers. Finally, splitter networks provide weak
policy controls: the aggregate bit rate consumed along a
path between two splitter nodes cannot be controlled and
allocated to different classes of flows in a stream-aware
fashion.

SUMMARY OF THE INVENTION

To address the wide variety of problems outlined above,
one embodiment of the present invention provides a
comprehensive redirection system for content distribution in a
virtual overlay broadcast network (OBN). In this system,
service nodes are situated at strategic locations throughout
the network infrastructure, but unlike previous systems,
these service nodes are coordinated across the wide area into
a cohesive, coordinated, and managed virtual overlay
network. Service node clusters peer with each other across IP
tunnels, exchanging routing information, client subscription
data, configuration controls, bandwidth provisioning
capabilities and so forth. At the same time, the service nodes are
capable of processing application-specific requests for
content, e.g., they might appear as a Web server or a
streaming-media server depending on the nature of the
supported service. In short, a service node has a hybrid role:
it functions both as a server as well as an application-level
content router.

In an embodiment of the present invention, an
improvement to a packet-switched network is provided. The
packet-switched network includes addressable routers for routing
packet traffic, wherein a packet of data is routed from a
source node to a destination node based on address fields of
the packet. The improvement comprises a redirector coupled
to at least one of the addressable routers and includes logic
for accepting a service request from a client, logic for
determining a selected server for handling the service
request, the selected server being one of a plurality of
servers that can handle the service request, and logic for
generating a redirection message directed to the client for
redirecting the service request to the selected server.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example of typical components and
interconnections that comprise a portion of Internet
connectivity;

FIG. 2 illustrates a typical Overlay ISP model;

FIG. 3 shows a network portion 300 used to implement an
anycast routing scheme in accordance with the present
invention;

FIG. 4 shows a master and affiliate networks configured to
accomplish interdomain anycast routing;

FIG. 5 shows how control and service functions included
in the present invention are separated within a particular
ISP;

FIG. 6 shows a portion of a data network constructed in
accordance with the present invention to implement active
session failover;

FIG. 7 shows a portion of a data network constructed in
accordance with the present invention to implement wide
area overflow;

FIG. 8 illustrates the use of IP Multicast in accordance
with the present invention; and

FIG. 9 shows an embodiment of the present invention
adapted for registering and connecting service installations
to the service broadcast network infrastructure.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

The comprehensive redirection system of the present
invention operates in tandem with service nodes situated at
strategic locations throughout the network infrastructure that
are coordinated across a wide area into a cohesive,
coordinated, and managed virtual overlay network. The
overlay network architecture is based on a design
philosophy similar to that of the underlying Internet architecture,
e.g., it exploits scalable addressing, adaptive routing,
hierarchical naming, decentralized administration, and so forth.
Because of this, the overlay architecture enjoys the same
high degree of robustness, scalability, and manageability
evident in the Internet itself. Unlike a physical internetwork,
where routers are directly attached to each other over
physical links, service nodes in the virtual overlay network
communicate with each other using the packet service
provided by the underlying IP network. As such, the virtual
overlay is highly scalable since large regions of a network
(e.g., an entire ISP's backbone) composed of a vast number
of individual components (like routers, switches, and links)
might require only a small number of service nodes to
provide excellent content-distribution performance.

Another important aspect of the present invention is the
'glue' interface between clients that desire to receive
information content and the service nodes that actually deliver it.
That is, the 'glue' interface is a mechanism by which the
client can attach to a service node, request a particular piece
of content, and have that content delivered efficiently. This
is sometimes referred to as the service rendezvous problem.
Fundamentally, service rendezvous entails a system by
which it is possible to: (1) publish a single name for a
service; (2) replicate the service throughout the network; and
(3) have each client that desires the service receive it from
the most appropriate server. To scale to millions of clients,
the service rendezvous mechanism must efficiently
distribute and load-balance client requests to the service nodes
spread across the wide area. Moreover, to efficiently utilize
network bandwidth, content should flow over the minimum
number of network links to reach the requesting client. Both
these points argue that clients should be directed to a nearby
service node capable of serving the request. If there is no
nearby service node capable of servicing the request, the
system should be able to redirect the client to a service node
elsewhere in the network across the wide area to service the
request. Furthermore, it should be possible to cluster service
nodes at a particular location and have the clients connect to
individual nodes within a cluster based on traffic load
conditions. In short, the service rendezvous system should
provide a mechanism for server selection and should utilize
redirection to effectuate load balancing to achieve the
desired result. In cases where a local cluster becomes
overloaded, the server selection should compensate to load
balance across the wide area.
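
The selection behavior described above (prefer a nearby cluster, balance load within it, and overflow across the wide area when the local cluster is overloaded) can be illustrated with a minimal sketch. Everything named here is a hypothetical illustration rather than part of the described system: the `ServiceNode` and `Cluster` types, the numeric `distance` metric, and the 0.9 overload threshold are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class ServiceNode:
    name: str
    load: float  # assumed metric: fraction of capacity in use, 0.0 to 1.0


@dataclass
class Cluster:
    name: str
    distance: int  # assumed proximity metric seen from the client (lower is closer)
    nodes: list


OVERLOAD_THRESHOLD = 0.9  # assumed cutoff; the text does not specify a value


def select_server(clusters):
    """Return the least-loaded node of the nearest cluster that has headroom.

    Clusters are tried from nearest to farthest, so a fully overloaded
    local cluster causes the choice to overflow to a more distant one.
    """
    for cluster in sorted(clusters, key=lambda c: c.distance):
        candidates = [n for n in cluster.nodes if n.load < OVERLOAD_THRESHOLD]
        if candidates:
            return min(candidates, key=lambda n: n.load)
    return None  # every known cluster is overloaded


def redirect_message(node):
    """Build a toy redirection message that points the client at the chosen node."""
    return {"action": "redirect", "server": node.name}
```

With a local cluster whose nodes are all above the threshold and a remote cluster with spare capacity, `select_server` returns a remote node, which mirrors the wide-area compensation described in the paragraph above.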

Unfortunately, service rendezvous is a difficult problem,
because the Internet architecture deliberately hides the
underlying structure of the network to impose flexibility and
robustness upon higher-layer protocols, so it is difficult to
discover and use a selected server for a particular network
transaction. To overcome these problems, the rendezvous
service described herein exploits "anycast" routing, a
network-level mechanism that can be used to route user
requests to nearby service nodes based on topological
locality.

The concept of anycast packet forwarding is well known 
in network research literature; yet the concept remains 
narrowly applied in practice because of compatibility issues 
with existing packet forwarding networks. In general, opera- 
tion of the Internet conforms to consensually agreed upon 
standards. The standards are set out in documents referred to 
as "requests for comments" (RFC). The RFCs applicable to 
Internet operation include RFC-1546 and others. 

At the highest level, there are two primary approaches to
implementing anycast packet forwarding. The first approach
is to introduce a special type of anycast address and to create
new routing protocols and service interfaces that are
"anycast aware." This obviously would entail the lengthy process
of standardization, adoption by router vendors and so forth.
The second approach is to reuse a piece of the existing
unicast address space. However, this second approach has
two corresponding technical challenges which to this point
have been unresolved. The challenges are that of: (1)
supporting stateful transport protocols; and (2) supporting
interdomain anycast routing and route aggregation. Fortunately,
embodiments of the present invention offer novel solutions
to these technical challenges. For example, a solution to the
problem of supporting stateful transport protocols is
provided in a section of this document entitled "Stateful
Anycasting." A solution to the problem of supporting
interdomain anycast routing and route aggregation is provided in
a section of this document entitled "Interdomain Anycast
Routing."

The rendezvous service described herein assumes that the
underlying packet forwarding is not "anycast aware."
However, a system based upon an "anycast aware" network is viable
as well. Anycast packet forwarding is used to forward
packets from a client to the nearest instance of the
rendezvous service.

Instead, one embodiment of the present invention
simplifies the anycast service model from a fully dynamic
framework (where hosts can join and leave anycast groups
dynamically) to a statically provisioned framework (where
only specially configured hosts within the network
infrastructure are members of an anycast group). In this statically
provisioned framework, the assignment, allocation, and
advertisement of anycast addresses to a central authority and
associates a large block of anycast addresses with a single,
well-connected backbone network. The backbone network
can be referred to as the content backbone (CBB).

Another advantage included in embodiments of the 
present invention is that clients attach to the content distri- 
bution network at explicit, per-client service access points. 
This allows the infrastructure to perform user-specific 
authentication, monitoring, customization, advertising, and 
so forth. In contrast, an approach based on pure multicast, 
albeit scalable, provides none of these features since the 
multicast receiver subscription process is completely anony- 
mous. 

In summary, a virtual overlay network built using
anycast-based service rendezvous enjoys the following attractive
properties:

The service access mechanism is highly scalable, since
the closest service node is discovered using anycast,
which can be accomplished with standard routing
protocols deployed in novel configurations;
The system offers substantial bandwidth savings, since
requests may be routed to the nearest service node,
thereby minimizing the number of network links that
content must flow across;
The service infrastructure provides fine-grained control,
monitoring, and customization of client connections;
The administration and configuration of the infrastructure
is highly decentralized, facilitating large-scale
deployment across heterogeneous environments managed by a
diverse range of administrative entities;
The system affords very high availability and robustness
when anycast is built on standard adaptive routing
protocols and the service elements are clustered for
redundancy, thus ensuring that requests are routed only
to servers that are properly functioning and advertising
their availability; and
The content broadcast network is incrementally
deployable, since the anycast-based redirection service
can first be built into the content broadcast backbone
and then built out into affiliate ISPs on an individual
basis to track growing user demand.
In the following sections of this document, the details of 
the architectural model and embodiments of various system 
components used to implement the anycast-based redirec- 
tion system for the virtual overlay broadcast network 
included in the present invention are described. 
Network Architecture 

FIG. 1 shows an example of typical components and 
interconnections that comprise a portion of the Internet 100. 
Internet service providers (ISPs) 101, 102 and 103 provide 
Internet access. A typical ISP operates an IP-based network 
across a wide area to connect individual customer networks 
104 and/or individual users to the network via access 
devices 106 (e.g., DSL, telephone modems, cable modems, 
etc.). The typical ISP also peers with other ISPs via 
exchange points 108 so that data traffic can flow from a user 
of one ISP to a user of another ISP. A collection of internal 
IP routers 110 interconnected with communications links 
111 provide connectivity among users within an ISP. Spe- 
cialized border routers 112 situated at the exchange points 
forward non-local traffic into and out of the ISP. Often, an
individual ISP network, such as ISP 103, is called an
autonomous system (AS) because it represents an
independent and aggregatable unit in terms of network routing
protocols. Within an ISP, intradomain routing protocols run
(e.g., RIP or OSPF), and across ISPs, interdomain routing
protocols run (e.g., BGP). The term "intradomain protocol"
is often used interchangeably with the term interior gateway
protocol (IGP).

As the Internet and World Wide Web (Web) have grown, 
ISPs realized that better end-to-end network service performance
could be attained by combining two innovative
architectural concepts in concert, namely: (1) aggressively 
peering with a large number of adjacent ISPs at each 
exchange point; and (2) co-locating data centers containing 
application services (e.g., Web servers) near these exchange 
points. The "co-location facility" (colo) at each peering 
point thus allows application services to be replicated at 
each peering point so that users almost anywhere in the 
network enjoy high-speed connections to the nearby service. 

FIG. 2 illustrates a typical architecture for an Overlay ISP
200, so named because a service network built in this way
forms an overlay structure across a large number of existing
ISPs, for example, ISPs 202, 204 and 206. The Overlay ISP 200
couples to the existing ISPs via routers 208 and further
couples to data centers (DC) 210. Overlay ISPs rent machine
space and network bandwidth to content providers that place
their servers in colos located at the DCs.

To summarize, a natural building block for the CBB is the
ISP colo. In embodiments of the present invention, service
nodes are housed in colos and arranged into an overlay
structure across the wide area using the available network
connectivity. However, service nodes need not be situated in
the specialized colo sites, and in fact, can exist in any part
of the network. The colos are a convenient and viable
deployment channel for the service nodes.
Interdomain Anycast Routing

FIG. 3 shows a network 300 configured to implement
anycast routing in accordance with the present invention.
The network 300 comprises routers (R1-R6), two server
devices S1 and S2, and two clients C1 and C2. In one
embodiment of network 300, both server devices advertise
reachability to the address block "A/24" (i.e., A is a 24-bit
prefix for a 32-bit IPv4 address) via IGP. Thus, the two
server devices utilize the routing advertisements to reflect
server availability into the infrastructure of the network 300.
Routers R4 and R3 are configured to listen to these
reachability advertisements on their attached LANs 302 and 304,
respectively. As a result of the IGP computation, the routers
R1-R6 in the network learn the shortest-path from each
client to the servers via addresses that fall within the "A"
prefix. Thus, if client C2 sends a packet to address A1 (where
the prefix of A1 is A), then router R2 will forward it to router
R4, which in turn forwards it to server S2, as shown at path
310. Similarly, packets sent to A1 from client C1 are routed
to server S1, as shown at path 312. If server S2 fails, then
advertisements from S2 for A/24 will cease and the network
will re-compute the corresponding shortest-path routes to
A/24. Consequently, packets sent from C2 to A1 are routed
to server S1, since there is no other node advertising such a
route, as shown at path 314.
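
The routing behavior just described can be sketched as a shortest-path search toward whichever nodes currently advertise the anycast prefix. This is only an illustration under stated assumptions: FIG. 3 is not reproduced here, so the link layout in `LINKS` is a hypothetical topology chosen to be consistent with paths 310, 312 and 314 as described, and hop count stands in for whatever link metric the IGP actually uses.

```python
from collections import deque

# Hypothetical link layout (an assumption; the figure is not reproduced here).
LINKS = [
    ("C1", "R1"), ("R1", "R3"), ("R3", "S1"),   # S1 sits behind R3
    ("C2", "R2"), ("R2", "R4"), ("R4", "S2"),   # S2 sits behind R4
    ("R1", "R5"), ("R5", "R6"), ("R6", "R2"),   # links joining the two halves
]


def build_graph(links):
    """Turn a list of undirected links into an adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def anycast_path(graph, source, advertisers):
    """Breadth-first search from source to the nearest advertising node.

    Returns the hop-count shortest path as a list of node names, or None
    when no advertiser is reachable.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in advertisers:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(LINKS)
both_up = {"S1", "S2"}          # both servers advertise A/24
after_failure = {"S1"}          # S2 has failed and stopped advertising
```

Calling `anycast_path(graph, "C2", both_up)` ends at S2, while the same call with `after_failure` ends at S1, which corresponds to the re-computation of routes after the advertisements from S2 cease.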

One of the problems posed by the anycast routing scheme
described above is how anycast routes are propagated across
the wide area to arbitrary sites that might not be configured
with anycast-based service nodes. Rather than require a new
infrastructure for anycast routing, embodiments of the
present invention simply leverage the existing interdomain
routing system with a framework in which a single AS
"owns" a given anycast address block and advertises it using
its normal interdomain protocol, i.e., BGP. Then, other
independent AS's can be incrementally configured with
anycast-aware service nodes, such that the IGP for those
AS's routes packets sent to the anycast address block in
question to the service nodes within that single AS.



To do this, the content backbone (CBB) is situated at the 
"master" AS, which owns the anycast address block and 
advertises it to the Internet using BGP. That is, an ISP carves 
out a block of its pre-existing but unused address space (or 
requests new addresses from the Internet Assigned Numbers 
Authority) and assigns this block to the CBB, which declares 
that this block of addresses is to be used for nothing but 
anycast routing. The master AS advertises the anycast 
block — call this block "A" — across the wide area, again 
using BGP as if it were a normal IP network. Thus, in the 
configuration described so far, any packet sent to an address 
in block A from anywhere in the Internet is routed to the 
master AS. 
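
As a rough illustration of why a packet addressed anywhere inside block A reaches the master AS, the sketch below performs a longest-prefix match over a toy routing table using Python's standard `ipaddress` module. The concrete prefixes are assumptions: 192.0.2.0/24 is a reserved documentation range used here as a stand-in for block A, and the next-hop labels are invented for the example.

```python
import ipaddress

# Stand-in for anycast block "A" (assumption: a documentation prefix, not a real allocation).
ANYCAST_BLOCK = ipaddress.ip_network("192.0.2.0/24")

# Toy interdomain routing table mapping prefixes to illustrative next-hop labels.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-upstream",
    ipaddress.ip_network("198.51.100.0/24"): "some-other-AS",
    ANYCAST_BLOCK: "master-AS",
}


def next_hop(destination):
    """Return the next hop for a destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

Any of the 256 addresses in the /24 resolves to the `master-AS` entry, while an address outside every specific prefix falls through to the default route.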

To provide the services that underlie the anycast routing 
infrastructure, the CBB deploys service nodes in the master 
AS and arranges for those nodes to advertise reachability for 
A using the master AS's IGP. Once this piece is in place, 
when a packet enters the master AS (from anywhere on the 
Internet), it is routed to the CBB service node that is closest 
to the border router traversed by the packet upon entering the 
master AS. Assuming the master AS is densely peered, then 
most users on the Internet will enjoy a low-delay, high-speed 
path to a service node within the master AS (CBB). 

Though the architecture described thus far provides a
viable mechanism for proximity-based service-node
location for nodes that are situated within the master AS, the
system is limited by the fact that all service nodes reside in
that master AS. A more scalable approach would allow
service nodes to be installed in other ISPs' networks. To do
so, an affiliate AS (that is, an ISP that supports the
rendezvous service but is not the master AS) simply installs service
nodes in exactly the same fashion as the master AS.
However, the affiliate advertises the anycast block only
within its domain using its IGP; it does not advertise the
anycast block outside its domain to its peers. In another
embodiment, an extension to this scheme is provided in
which multiple AS's do advertise the anycast block in BGP
(i.e., their exterior routing protocol). That extension is
described in another section of this document.

FIG. 4 shows a master AS 400 and affiliate networks 402,
404 and 406 configured to accomplish interdomain anycast
routing. The master AS 400 comprises anycast-based service
nodes A1, A2, and A3, and couples to the three affiliate
networks via routers 408. Four clients C1, C2, C3, and C4
attach to the affiliates as shown. Affiliates 402 and 406 have
no service nodes deployed therein, while affiliate 404 has a
single service node A4 configured into its infrastructure.
Thus, given the normal behavior of unicast inter- and
intra-domain routing protocols, packets sent to block A from
C1 are sent to A1, as shown by path 410, while packets sent
to block A from C2 are routed to A2, as shown by path 412.
The paths 410 and 412 represent the shortest interdomain
paths from the affiliate 402 to the master AS 400. In contrast,
packets sent to block A from client C3 are routed to service
node A4, as shown by path 414. This occurs since the IGP
in the affiliate 404 will cause the service node A4 to advertise
reachability to block A and thus "hijack" packets sent to that
address. Likewise, packets sent to block A from C4 will also
be "hijacked" by A4, as shown by path 416, since the path
from the affiliate 406 to the master AS 400 traverses the
affiliate 404. This is a deliberate and desirable feature of the
architecture in accordance with the present invention, since
it scales and distributes the load of the system without
requiring anycast intelligence to be deployed everywhere for
correct operation.
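
The "hijack" behavior can be sketched at the AS level: a packet for block A follows the interdomain path toward the master AS and is absorbed by the first AS on that path that has service nodes advertising the block in its IGP. The table names and AS labels below are hypothetical; the adjacency is an assumption consistent with the description of FIG. 4 (affiliate 406 reaches the master through affiliate 404, and only the master and affiliate 404 host service nodes).

```python
# Assumed AS-level route from each affiliate toward the master AS.
AS_PATH_TO_MASTER = {
    "affiliate-402": ["affiliate-402", "master-400"],
    "affiliate-404": ["affiliate-404", "master-400"],
    "affiliate-406": ["affiliate-406", "affiliate-404", "master-400"],
}

# Which affiliate each client attaches to, per the description of FIG. 4.
CLIENT_HOME = {
    "C1": "affiliate-402",
    "C2": "affiliate-402",
    "C3": "affiliate-404",
    "C4": "affiliate-406",
}

# ASes whose IGP carries the anycast block because service nodes are installed there.
ANYCAST_AWARE = {"master-400", "affiliate-404"}


def serving_as(client):
    """Return the first AS on the path to the master that serves block A locally."""
    for as_name in AS_PATH_TO_MASTER[CLIENT_HOME[client]]:
        if as_name in ANYCAST_AWARE:
            return as_name
    raise LookupError("no AS on the path serves the anycast block")
```

Under these assumptions, C1 and C2 are served in the master AS, while C3 and C4 are both served in affiliate 404, matching paths 410 through 416. Adding another affiliate name to `ANYCAST_AWARE` models the incremental deployment the text describes.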

Although the anycast addressing and routing architecture
provides a framework for scalable service rendezvous,
ownership of the anycast address space is preferably
centralized at the CBB and/or master AS. While this limits
the overall flexibility of the solution to a degree, it has
the benefit of centralizing the management of the address
space. In this model, when content providers sign up with the
CBB, they are assigned an exclusive anycast address space
from the CBB's block of available addresses. In turn, the
content providers use this anycast address space in
references to their services, e.g., as the host part of a
uniform resource locator (URL). Thus, users that click on
such Web links are directed to the closest service node in
the CBB or its affiliates.

Naming and Service Discovery

Once a service node receives an anycast request for service,
the service must be instantiated on behalf of the requesting
client. That is, the service request must be satisfied
locally (if an extension of the master service is locally
available), or it must be initiated from the master service
site. One method to locate the service at the master site is
to iteratively apply the anycast routing architecture from
above. Yet, an attempt to send an anycast packet to the
anycast address in question will fail because the packet will
be routed back to the host it came from. In other words, the
anycast packet is trapped in the domain that received it.
Thus, the system must rely upon some other mechanism for
communicating between the remote service node and the master
service site.

In one embodiment the service node queries some database to
map the anycast address back to the master service site, or
even to a set of sub-services that are related to the service
being offered. Fortunately, a distributed database to perform
this type of mapping in a highly scalable and robust fashion
already exists. The Domain Name System (DNS), which handles
IP host name-to-address mappings in the Internet at large,
can be easily reused and configured for this purpose. More
specifically, RFC 2052 defines a scheme for defining
arbitrary service entries using the DNS service (SRV)
resource records. By translating the numeric anycast address
into a DNS domain name according to some well-defined,
deterministic algorithm, a service node can determine the
location of services using DNS queries keyed by this anycast
name. The required DNS configuration may be carried out by
the CBB, or the CBB may delegate authority to configure the
DNS subdomain for a particular anycast block to the original
content provider, thereby allowing that provider to configure
and manage the offered services as it sees fit.

An alternative method is to assign only a single anycast
address to a CBB and embed additional information about the
content originating site in the client URL. That is, anycast
routing is used to capture client requests for any content
published through the CBB, while additional information in
the URL is used to identify the particular location or other
attributes for the content in question. In the remainder of
this disclosure, the former method (wherein multiple anycast
addresses are assigned to a CBB) is assumed for illustrative
purposes; however, it would be apparent to one with skill in
the art how the system could be simplified so that only a
single anycast address were assigned to each CBB.

To summarize, the service rendezvous problem is solved in a
scalable fashion with two interdependent mechanisms: (1)
clients bind to the service infrastructure using anycast
addresses and routing; and (2) service nodes bind to the
master service site using auxiliary information conveyed
explicitly via client URLs or implicitly through a
distributed directory like DNS. Excellent scaling performance
results by virtue of proximity-based anycast routing and the
caching and hierarchy that are built into the DNS.

Stateful Anycasting

One of the difficulties in implementing an anycast service on
top of the IP packet service is the dynamic nature of the
underlying routing infrastructure. Because IP allows packets
to be duplicated and routed along different paths (among
other things), packets sent using the anycast service may be
delivered to multiple anycast service nodes simultaneously,
or consecutive packets may be delivered to one service node,
and then another, intermittently.

This is especially problematic for transport-layer protocols
like TCP, which assume that the end points of the
communication channel are fixed. As an example, consider a
TCP connection to a service node via an anycast address.
Suppose half way through the connection, the anycast route
changes so that the client's packets are suddenly routed to a
different service node. However, that new service node has no
knowledge of the existing TCP connection, so it sends a
"connection reset" back to the client. This breaks the
connection, which may result in a disruption of the service
that the client was invoking. The crux of the problem is that
TCP connections are stateful while IP is stateless.

A fair amount of research has dealt with this problem, but
none of the research has produced a solution adequate for use
with the present invention. It may be possible to change the
TCP protocol in a way that would circumvent this problem. But
changing the entire installed base of millions of deployed
TCP stacks in the Internet is next to impossible. Other
approaches have advocated methods where routers pin down
state within the network to ensure that an anycast TCP
connection remains on its original path. This is impractical
as well because it involves upgrading all routers in the
Internet infrastructure and the work is still very much in
the research stage.

In an embodiment of the present invention, a novel scheme
called stateful anycasting is employed. In this approach, the
client uses anycast only as part of a redirection service,
which by definition is a short-lived ephemeral transaction.
That is, the client contacts an anycast referral node via the
anycast service, and the referral node redirects the client
to a normally-addressed and routed (unicast) service node.
Thus, the likelihood that the redirection process fails
because the underlying anycast routes are indeterminate is
low. If this does occur, the redirection process can be
restarted, either by the client, or depending on context, by
the new service node that has been contacted. If the
redirection process is designed around a single request and
single response, then the client can easily resolve any
inconsistencies that arise from anycasting pathologies.

If a service transaction is short-lived (e.g., the data can
be transferred in some small number of round-trip times),
then the need for redirection is limited. That is, short Web
connections could be handled in their entirety as a TCP
anycast connection. On the other hand, long-lived connections
like streaming media would be susceptible to routing changes,
but the stateful anycasting would minimize the probability
that a route change could cause a problem (i.e., the change
would have to occur during the redirection process). Yet if
an anycast based infrastructure is widely deployed, then
application vendors will have incentive to provide support
for the anycast service; if so, a client could be modified to
transparently re-invoke the anycast service if a routing
transient caused any sort of service disruption.

In another embodiment, the adverse effects of routing
transients are minimized by carefully engineering the
operating policies of the infrastructure. Thus, a large-scale
anycasting infrastructure may be built as described herein
where dynamic routing changes are fairly infrequent and thus,
in practice, the problems induced by the statelessness of IP
with regard to anycast are minimized. In short, the stateful
anycasting method described herein could provide for a highly
available, robust, and reliable service-rendezvous system.

A Proximity-based Redirection System

Given the above described architectural components, this
section describes an embodiment of the present invention for
an anycast-based redirection service that combines these
components. Some of the components of this design are clearly
generalizable to a variety of useful configurations and
deployment scenarios and are not limited by the specific
descriptions herein. Other mechanisms are specifically suited
for a particular service like streaming media broadcast or
Web content delivery.

The proximity-based redirection system provides a service
node attachment facility for an arbitrary content delivery
network by: (1) allowing arbitrary application-specific
redirection protocols to be used between the client and the
service; and (2) providing the glue between the redirection
service, the client, the master service site, and the CBB.

The CBB owns a particular anycast address space rooted in the
master AS. Each content provider is assigned one or more
anycast addresses from the anycast address space. Because
arbitrary services can be bound to an anycast address using
DNS, only one address is required for each distinct content
provider. For illustrative purposes, we will assume that a
canonical content provider's DNS domain is "acme.com" and the
CBB's is "cbb.net". The anycast address block assigned to the
CBB is 10.1.18/24 and the address assigned to acme.com is
10.1.18.27. It is further assumed that the content provider
(acme.com) generates Web content, on-demand streaming media
content, and live broadcast content.

The following sections describe the components that comprise
the local architecture (defined within a colo), the
components that comprise a wide area architecture (defined
between and across ISPs), and one specific redirection
algorithm based on these architectures and the general
principles that underlie them in accordance with the present
invention.

The Local Architecture

This section describes the arrangement of devices to support
a proximity-based redirection service within a particular
ISP, e.g., inside a colo, and how those devices are
configured and interfaced to external components.

The content delivery architecture decomposes naturally into
two interdependent yet separable components: (1) the control
and redirection facility; and (2) the actual service
function. That is, a service is typically invoked by a
control connection that in turn triggers the delivery of the
service across a data connection. Moreover, control
connections typically are amenable to being redirected to
alternate IP hosts. Thus, the high-level model for the system
is as follows:

a client initiates a control connection to an anycast address
to request a service;

an agent at the termination point for that anycast dialogue
redirects the client to a fixed service-node location (i.e.,
addressed by a standard, non-anycast IP address); and

the client attaches to the service through the control
connection to this fixed location and initiates the service
transfer.

The requirements placed on the control and data handling
components are vastly different. For example, the control
elements need to handle a large number of ephemeral requests
and quickly redirect the requests, while the service elements
need to handle a sustained load of persistent connections
like streaming media. Also, the management requirements for
these two device classes are quite different, as is the
system's sensitivity to their failure modes. For example, the
control elements must manage the server resources, so that
considerations such as load balancing are factored into
server selection. In this regard, the control elements are
capable of monitoring "server health" to determine which
servers to redirect clients to. For example, server health is
based on various parameters, such as server capacity,
loading, anticipated server delays, etc., that may be
monitored or received indirectly by the control elements and
used to make server selection decisions.

FIG. 5 shows an embodiment of the present invention that
demonstrates how control and service functions are separated
within a particular ISP to meet the requirements outlined
above. In this embodiment, a service cluster 502 of one or
more service nodes (SN) and one or more anycast referral
nodes (ARN) are situated on a local-area network segment 504
within a colo 500. The network segment 504 couples to a colo
router 506 that in turn couples to the rest of the ISP and/or
the Internet 508.

Under this configuration, a client request 510 from an
arbitrary host 512 in the Internet 508 is routed to the
nearest ARN 514 using proximity-based anycast routing. The
ARN 514 redirects the client (path 516) to a candidate
service node 518 (path 520) using the range of techniques
described herein. This service model scales to arbitrary
client loads because the service nodes are clustered, which
allows the system to be incrementally provisioned by
increasing the cluster size. In addition, the ARNs themselves
can be scaled with local load-balancing devices like layer-4
switches.

At any given time, one of the ARNs is designated as the
master, for example, the ARN 514, while the others are
designated as backups 522. This designation may change over
time. These agents may be implemented in individual physical
components or may all run within one physical device. It will
be assumed that in this example, the SNs and ARNs are
attached to a single network segment 504 via a single network
interface, though the system could be easily generalized such
that these agents and physical devices operate across
multiple local-network segments. Each ARN is capable of
advertising routing reachability to the anycast address space
owned by the service-node infrastructure, but only the master
ARN actively generates advertisements. Likewise, the ISP's
colo router(s) 506 attached to the network segment 504 are
configured to listen to and propagate these advertisements.
This exchange of routing information is carried out by
whatever IGP is in use within that ISP, e.g., RIP, OSPF, etc.
In the preceding example a single ARN is elected master for
all anycast addresses, and the other ARNs serve as backups.
In an alternative embodiment a master is elected for each
anycast address. This would allow load to be distributed
among multiple active ARNs, each serving a disjoint set of
anycast addresses. The failure of any ARN would start the
election process for its anycast addresses.

There are two key steps to bootstrapping the system: (1) the
ARN(s) must discover the existence and addresses of service
nodes within the SN cluster; and (2) the ARN(s) must
determine which service nodes are available and are not
overloaded. One approach is to configure the ARNs with an
enumeration of the IP addresses of the service nodes in the
service cluster. Alternatively, the system could use a simple
resource discovery protocol based on local-area network
multicast, where each service node announces its presence on
a well-known multicast group and each ARN listens to the
group to infer the presence of all service nodes. This latter
approach minimizes configuration overhead and thereby avoids
the possibility of human configuration errors.

With this multicast-based resource discovery model, a new
device is simply plugged into the network and the system
automatically begins to use it. The technique is as follows:

The ARN(s) subscribe to a well-known multicast group G_s.

The SN(s) in the service cluster announce their presence and
optional information like system load by sending messages to
group G_s.

The ARN(s) monitor these messages and build a database of
available service nodes, storing and updating the optional
attributes for use in load balancing and so forth.

Each database entry must be "refreshed" by the corresponding
SN, otherwise it is "timed out" and deleted by the ARN(s).

Upon receipt of a new service request, the ARN selects a
service node from the list of available nodes in the database
and redirects the client to that node.

Note that since all the devices in the service cluster are
co-located on a single network segment or LAN, the use of IP
multicast requires no special configuration of routing
elements outside of, or attached to, the LAN.

Fault Recovery

The protocols described thus far were designed to perform
automatic fault recovery and thus engender a very high degree
of availability for the service. The system is robust to both
ARN failure and SN failure.

Because the ARN "times out" the SN database entries, SNs that
fail are not used for service requests. Thus, if a client
reconnects to the service (either transparently to the user
or with user interaction), the service is restarted on
another service node. If the ARN keeps persistent state about
the client, then the system can potentially resume the old
service incarnation rather than starting a new one from
scratch (e.g., so that the user is not billed twice when SN
failure detection and handoff occurs).

Another problem can occur if the ARN fails. By maintaining
redundant ARNs in a single colo, this problem can be resolved
using the following technique:

Each ARN subscribes to a well-known multicast group G_a;

Each ARN announces its existence by sending an announcement
message to group G_a;

Each ARN builds a database of active ARN peers and times out
entries that are not refreshed according to some configurable
period that is greater than the inter-announcement period;
and

The ARN with the lowest numbered network address (i.e., the
ARN with a network address less than all other ARNs in the
database) elects itself as the master ARN and begins to
advertise reachability to the anycast address block via the
IGP.

Thus, if the master ARN fails, the backup ARNs learn of this
condition very quickly (after a single announcement interval)
and a new master ARN is elected. At that point, as a side
effect of the new IGP route advertisements, anycast packets
are routed to the new master ARN by the colo router.

Wide-area Architecture

Having described the local-area architecture of the devices
within a single colo installation, a description of a
wide-area architecture will be provided which includes how
the individual service-node clusters are coordinated and
managed across the wide area. There are two main wide-area
components for realizing embodiments of the content overlay
network included in the present invention, namely:

a data service, which involves the routing and management of
vast amounts of data from originating content sites to the
service nodes in the colos; and

a solution to the anycast-based service rendezvous problem,
which involves binding the services requested by clients to
the originating master site.

How the former problem is solved (that is, how data is
reliably and efficiently disseminated across the wide area to
service nodes) is beyond the scope of this disclosure. For
example, content might be carried by a streaming broadcasting
network such as described in a pending U.S. Patent
Application Serial No. 60/115,454, entitled "System for
Providing Application-level Features to Multicast Routing in
Computer Networks" filed on Jan. 22, 1999. Content might also
be carried by a file dissemination protocol based on flooding
algorithms like the network news transport protocol (NNTP).

This disclosure describes how the anycast-based redirection
system interfaces with available content delivery systems. A
novel framework is used in which service-specific
interactions are carried out between the ARN, the SN, the
client, and potentially the originating service or content
site. For example, the client might initiate a Web request to
anycast address A, which is routed to the nearest ARN
advertising reachability to address A, which in turn
redirects the client to a selected SN with a simple HTTP
redirect message, or the Web request may be serviced directly
from the ARN.

In one embodiment of the anycast-based redirection system,
the ARN can "prime" the SN with application-specific
information that is not capable of being conveyed to the SN
from an unmodified existing client. Here, the ARN contacts an
SN and installs some state Q bound to some port P. The port P
may be allocated by the SN and returned to the ARN. Then, the
client can be redirected to the SN via port P, so that the
unmodified client implicitly conveys the state Q via the new
connection. For example, Q might represent the wide-area
broadcast channel address to which the service node should
subscribe for a particular streaming media feed. Since the
unmodified client is not directly protocol-compatible with
the CBB infrastructure, the proper channel subscription is
conveyed in the state transfer Q without having to involve
the client in that dialogue.

To avoid having to modify a large existing installed base of
clients (like Web browsers and streaming media players and
servers), the client-SN interactions are based on existing,
service-specific protocols, e.g., HTTP for the Web, RTSP for
streaming media protocols, or even other vendor-proprietary
protocols.

Once a client request is initiated and intercepted by the
ARN, some wide-area service must be invoked to pull the
content down from the CBB into the local service node (if the
content is not already present). As described earlier, an
iterative use of anycasting will fail. Thus, the DNS system
is used to map anycast addresses back to the services in a
scalable and decentralized fashion.

For example, if it is desired to support caching of Web
objects for the content provider "acme.com", and assuming the
CBB assigns acme.com the anycast address 10.1.18.27, then a
pointer to the master server can be configured into DNS with
a SRV resource record such as:

anycast-10-1-18-24.http.tcp.cbb.net SRV www.acme.com

When a service node receives a client connection request
on TCP port 80 (i.e., the standard HTTP Web port) to its 
anycast address 10.1.18.27, that service node can query the 
DNS SRV record for (anycast-10-1-18-27.http.tcp.cbb.net)
to learn that the master host for this service is www.acme.com.
That knowledge can be locally cached and when the
requested content is fetched from www.acme.com, it too can 
be locally cached. The next request to the same anycast 
address for the same content can then be satisfied locally. 
Note that the content stored on www.acme.com could have 
links that explicitly reference
anycast-10-1-18-27.http.tcp.cbb.net or the site could employ a
more user-friendly name, e.g., www-cbb.acme.com, that is simply
a CNAME for the anycast name:

www-cbb.acme.com CNAME anycast-10-1-18-27.http.tcp.cbb.net

Consider another example where it is desirable to support 
a very large-scale streaming media broadcast also from 
acme.com. In addition to the master Web server, we might 
need to know the location of a "channel allocation service" 
(CAS) that maps streaming media URLs into broadcast 
channel addresses, where a channel address is akin to an 
application-level multicast group as described in No. 
60/115454. In this case, we query the DNS for a SRV 
resource record that points to the CAS to obtain a record that 
might have the form: 

anycast-10-1-18-24.cas.tcp.cbb.net SRV cas.acme.com

When the ARN receives a client connection request for a 
streaming media URL, it queries cas.acme.com to map that 
URL to the broadcast channel (and locally caches the result 
for future client requests), then subscribes to the channel 
over the CBB now that the channel address is known. 

By storing service bindings in the DNS in this fashion, an 
arbitrary anycast service node can dynamically and auto- 
matically discover the particular services that are bound to 
a particular anycast address. There is no need to configure 
and update service nodes within the infrastructure with this 
knowledge. This greatly simplifies the configuration and 
management of the anycast-based service rendezvous 
mechanism and the content broadcast network at large. 

As described above, the DNS SRV records store map- 
pings from service names to corresponding server addresses. 
However, there may be cases where the ARN needs more 
information than a simple list of servers for a named service. 
The additional information might specify a service node 
selection algorithm, or might specify a service node setup 
procedure. In these cases, the information for the named 
service could be stored in a directory server (like LDAP or 
X.500) or on a network of web servers. When compared 
with DNS, these servers offer greater flexibility and exten- 
sibility in data representation. 
The Redirection Algorithm 

Given the above-described components and system 
architecture, an embodiment of the present invention is 
provided to demonstrate how an end-host invokes a service 
flow or transaction from the service-node infrastructure 
using stateful anycasting. 

A user initiates a content request, e.g., by clicking on a 
Web link represented as a URL. 

The client resolves the DNS name of the resource that the 
URL references. This name ultimately resolves to an 
anycast address that was administered by the authority 
(e.g., www.acme.com is a CNAME for
any-10-1-18.27.cbb.net).

The client initiates a normal application connection using
the anycast address, e.g., a Web page request using
HTTP over TCP on port 80 or a streaming media
request using RTSP over TCP port 554.

As a side effect of the anycast routing infrastructure 
described above, the client's packets are routed to the 
nearest ARN advertising reachability to the address, 
thereby initiating a connection to that ARN. The ARN 
is prepared to accept requests for each configured 
service, e.g., Web requests on port 80. 

At this point, if the data is available and is of a transac- 
tional nature, then the ARN can either respond with the 
content directly or redirect the requesting client to a 
service node as follows: 

The ARN selects a candidate service node S from its 
associated service cluster. The selection decision 
may be based on load and availability information 
that is maintained from a local monitoring protocol 
as described above. 
The ARN performs an application-specific dialogue 
with S as necessary in preparation for the client C to 
attach to S. For example, in the case of live broadcast
streaming media, the ARN might indicate the broadcast
channel that S should tune in to via a
request to the CBB overlay network. As part of this
dialogue, S may return information to the ARN that 
is required to properly redirect C to S. Whether this 
information is present and the nature of that infor- 
mation is specific to the particular service requested. 
The ARN responds to the original client request with a 
redirection message that refers the client C to the 
service node S selected above. 
The client C contacts S, in a client-specific fashion, to 
initiate the flow or content transaction associated 
with the service desired. For instance, the client may 
connect to S using the streaming media control 
protocol RTSP to initiate a live transmission of 
streaming media over RTP. 
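
The selection and redirection steps above can be sketched as
follows. This is a minimal illustration, assuming a soft-state table
that is refreshed by service-node announcements and an HTTP 302
response as the redirection message. The class name, the 30-second
timeout, the least-loaded selection rule, and the example addresses
are assumptions made for the sketch.

```python
import time

class ServiceNodeTable:
    """Soft-state table of service nodes kept by an ARN."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout    # seconds before an entry is timed out
        self.clock = clock        # injectable clock, useful for testing
        self.entries = {}         # address -> (load, last_seen)

    def announce(self, address, load):
        """Record or refresh an announcement received from a service node."""
        self.entries[address] = (load, self.clock())

    def expire(self):
        """Delete entries that were not refreshed within the timeout."""
        now = self.clock()
        self.entries = {a: (load, seen)
                        for a, (load, seen) in self.entries.items()
                        if now - seen <= self.timeout}

    def select(self):
        """Pick the least-loaded live node, or None if none is available."""
        self.expire()
        if not self.entries:
            return None
        return min(self.entries, key=lambda a: (self.entries[a][0], a))

def redirect_response(node, path):
    """Build an HTTP redirect that refers the client to the chosen node."""
    return (f"HTTP/1.1 302 Found\r\nLocation: http://{node}{path}\r\n"
            "Content-Length: 0\r\n\r\n")
```

Because failed nodes simply stop refreshing their entries, they drop
out of the table without any explicit failure notification.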
Active Session Failover 

One disadvantage of the stateful anycasting redirection 
scheme described above is that if the selected service node 
fails for some reason, all clients fed by that node will 
experience disrupted service. If the client is invoking a 
sustained service like a streaming-media feed, the video 
would otherwise halt and the client would be forced to retry. 
In an alternative embodiment, the client may be modified to 
detect the service node failure and re-invoke the redirection 
process before the user notices any degradation in service, a 
process herein called "active session failover". 

FIG. 6 shows a portion of a data network 800 constructed 
in accordance with the present invention. The data network 
800 shows network transactions that demonstrate how active 
session failover operates to deliver content to a client 
without interruption. 

Initially, a client 802 sends a service request 820 to the 
anycast address A, which is routed to ARN 804. The service 
request 820 requests content originating from a CBB 803. 
The ARN 804 decodes the request to determine the appli- 
cation specific redirection message to be sent to the client 
802. The redirection message 822 transmitted by the ARN 
804, redirects the client 802 to service node 806. The client 
802 then transmits a request 824 to obtain the content (e.g., 
a streaming media feed) via an application-specific protocol
(e.g., RTSP) that causes node 806 to request the streaming- 
media channel across the wide-area by sending a channel 
subscription message 826 to service node 808 using the 
channel description information in the client request (for 
example, see No. 60/115454). The result is that content 
flows from service node 808 to the client as shown by path
828.

Now, it will be assumed that service node 806 fails. The
client 802 notices a disruption in service and reacts by
re-invoking the stateful anycast procedure described in the
previous section: a service request 830 is sent to the anycast 
address A and received by the ARN 804, which responds 
with a redirection message 832, directing the client to a new 
service node 810. The client can now request the new service 
feed from the service node 810, as shown at 834. The service 
node 810 sends a subscription message to the node 808 as 
shown at 836, and the content again flows to the client as 
shown at 838. Assuming the client utilizes adequate buff- 
ering before presenting the streaming-media signal to the 
user (as is common practice to counteract network delay 
variations), this entire process can proceed without any 
disruption in service. When the client attaches, it can send 
packet retransmission requests to service node 810 to posi- 
tion the stream appropriately and retransmit only those 
packets that were lost during the session failover process. 

It might be possible that the client incorrectly infers the
failure of service node 806 because of, for example, a
momentary network outage. In this case, the client can
simply ignore the redirection message 832 and continue to 
receive service from service node 806. 
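
A modified client's failover loop might look like the sketch below.
It assumes two caller-supplied functions that do not appear in the
text: resolve(), which performs the anycast redirection and returns
a service-node address, and fetch(node, position), which yields
(sequence number, payload) pairs starting at the given position and
raises ConnectionError when the node fails.

```python
def stream_with_failover(resolve, fetch, max_redirects=3):
    """Receive a stream, re-invoking redirection when a service node fails.

    resolve and fetch are hypothetical callables supplied by the caller.
    """
    position = 0      # next sequence number the client still needs
    received = []
    for _ in range(max_redirects + 1):
        node = resolve()                   # anycast redirection step
        try:
            for seq, payload in fetch(node, position):
                received.append(payload)
                position = seq + 1         # remember where to resume
            return received                # stream ended normally
        except ConnectionError:
            continue                       # node failed: redirect again
    raise RuntimeError("service unavailable after repeated failover")
```

Tracking the resume position lets the new service node send only the
packets that were lost during the failover, as described above.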
Wide-area Overflow 

One potential problem with the service rendezvous 
mechanism described above is that a given service node 
installation may run out of capacity because too many 
clients are routed to that installation. This may be solved in 
an embodiment where the redirection system is capable of 
redirecting client service requests across the wide area in 
cases of overload. For example, if all of the local service 
nodes are running at capacity, the redirector can choose a 
non-local service node and redirect the client accordingly. 
This redirection decision can in turn be influenced by 
network and server health measurements. In this approach, 
the redirector sends periodic "probe" messages to the candidate
servers to measure the network path latency. Since the
redirector is typically near the requesting client, these
redirector-to-server measurements represent an accurate
estimate of the corresponding network path between the
client and the candidate server.

In this embodiment, there are three steps to performing 
wide-area redirection: 

ARNs discover candidate service nodes. 

Each ARN measures network path characteristics between
itself and each service node.

ARNs query service nodes for their health. 

Given information obtained from the above steps, ARNs 
can choose the service node that is likely to provide the best 
quality of service to any requesting client. To do so, each 
ARN maintains an information database containing load 
information about some number of eligible service nodes. 
The ARN consults its information database to determine the 
most available service node for each client request. To 
maintain its load information, an ARN can actively probe 
network paths and service nodes. Alternatively, service 
nodes can monitor network load and internal load, and report 
load information to their respective ARNs. 
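
As one illustration of how an ARN might consult its information
database, the sketch below filters out nodes that are near capacity
and then prefers the lowest probed latency. The record fields, the
0.9 load cutoff, and the ranking rule are assumptions; the text does
not prescribe a particular selection policy.

```python
from dataclasses import dataclass

@dataclass
class NodeInfo:
    address: str
    load: float         # fraction of capacity in use, 0.0 to 1.0
    latency_ms: float   # probed redirector-to-server path latency

def choose_node(nodes, max_load=0.9):
    """Return the eligible node with the lowest latency, or None."""
    eligible = [n for n in nodes if n.load < max_load]
    if not eligible:
        return None     # every known node is at or near capacity
    # Rank by latency first, then by load to break ties.
    return min(eligible, key=lambda n: (n.latency_ms, n.load))
```

A result of None corresponds to the overload case, in which the ARN
would have to widen its set of candidate service nodes.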

To effect local-area load balancing, each ARN is config- 
ured with the IP addresses of some number of nearby service 
nodes. The ARN maintains load information for these ser- 
vice nodes. However, this local-area approach suffers when 
load is geographically concentrated, since the ARN may 
have fully loaded all of its nearby service nodes, and thereby 
be forced to deny additional service requests from its clients. 
This can occur even though some number of service nodes
just beyond the local area are underutilized.

Wide-area load balancing in accordance with the present
invention overcomes the above-described problem. In wide-area
load balancing, each ARN is configured with the IP
addresses of all service nodes in the network and maintains 
an information database containing load information for all 
service nodes. Alternatively, the ARNs may exchange load 
information using a flooding algorithm. 

Another embodiment of the present invention employs a 
scheme called variable-area load balancing. With this 
scheme, each ARN maintains the information database for 
some number of eligible service nodes; and the number of 
eligible service nodes increases with local load. That is, as 
nearby service nodes approach their capacity, the ARN adds 
to its information database load information about some 
number of service nodes just beyond the current scope of the 
ARN. The following provides two different methods that 
may be used to discover incrementally distant service nodes. 

In a first method, the ARN is provisioned with the IP 
addresses of some number of adjacent service nodes. To 
identify incrementally distant service nodes, the ARN sim- 
ply queries these service nodes for a list of neighboring 
service nodes since the service nodes are presumed to form 
a virtual overlay network. This approach may be referred to 
as "overlay network crawling." 
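Overlay network crawling amounts to a bounded breadth-first search from the provisioned adjacent nodes. The sketch below assumes a hypothetical `query_neighbors` callable standing in for the neighbor-list query; the text does not specify the query protocol or a hop limit, so both are illustrative.

```python
from collections import deque


def crawl_overlay(seed_nodes, query_neighbors, max_hops):
    """Discover service nodes within max_hops overlay hops of the seeds.

    seed_nodes:      nodes the ARN was provisioned with (hop distance 0)
    query_neighbors: hypothetical callable, node -> iterable of neighbor nodes
    Returns a dict mapping each discovered node to its hop distance.
    """
    distance = {node: 0 for node in seed_nodes}
    queue = deque(seed_nodes)
    while queue:
        node = queue.popleft()
        if distance[node] >= max_hops:
            continue  # do not expand beyond the current scope
        for neighbor in query_neighbors(node):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```

Under variable-area load balancing, an ARN could raise `max_hops` as its nearby service nodes approach capacity and crawl again to widen its scope.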

In a second technique, each ARN and each service node 
is assigned a multi-part name from a hierarchical name 
space. Given the names of two service nodes, an ARN can 
determine which is nearest using a longest pattern match. 
For example, an ARN named:

arn.sanjose.california.pacificcoast.usa.northamerica

can determine that it is closer to the service node named:

sn.seattle.washington.pacificcoast.usa.northamerica

than it is to the service node named:

sn.orlando.florida.atlanticcoast.usa.northamerica

using a right-to-left longest pattern match. Each ARN can
retrieve a directory of all service node names and their
corresponding IP addresses. The directory may be implemented
using DNS or an analogous distributed directory
technology. This variable-area load balancing scheme
handles geographically concentrated load by redirecting
clients to incrementally distant service nodes. The scheme 
addresses scalability concerns by minimizing the number of 
ARN-to-service-node relationships. That is, an ARN only 
monitors the number of service nodes required to serve its 
near-term client load. Moreover, the rate at which ARNs 
probe candidate service nodes is adjusted in inverse propor- 
tion to the distance, since in general the number of nodes at 
a distance that is N hops from a given node grows with N. 
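The right-to-left longest pattern match on hierarchical names can be sketched as a count of shared trailing labels. The function names below are illustrative only, and a dot-separated, case-insensitive label comparison is an assumption.

```python
def shared_suffix_labels(name_a, name_b):
    """Count the labels two dotted names share, comparing right to left."""
    labels_a = name_a.lower().split(".")[::-1]
    labels_b = name_b.lower().split(".")[::-1]
    count = 0
    for a, b in zip(labels_a, labels_b):
        if a != b:
            break
        count += 1
    return count


def nearest_by_name(arn_name, service_node_names):
    """Pick the service node whose name shares the longest suffix with the ARN."""
    return max(service_node_names,
               key=lambda name: shared_suffix_labels(arn_name, name))
```

With the names from the example above, the San Jose ARN shares three trailing labels with the Seattle node and two with the Orlando node, so the Seattle node is selected.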

FIG. 7 shows a portion of a data network 900 constructed 
in accordance with the present invention. The data network 
900 includes three connected local networks 902, 904 and 
906. As described in one embodiment of the present 
invention, the network 900 is configured to provide wide 
area overflow. 

The local network 902 includes ARN 908 (redirector) that 
has an associated information database (DB) 910. Also 
included in the local network 902 are service nodes 912 and 
914. The service nodes are shown providing information 
content 928 to clients (C) 916, 918, 920, 922, 924, and 926. 

The networks 904 and 906 include ARNs 930, 932, 
information databases 934, 936 and service nodes 938, 940, 
942 and 944, respectively. These service nodes are providing 
the information content 928 to a number of other clients. 

The ARN 908 monitors network loading characteristics of 
its local service nodes 912 and 914. This loading informa- 
tion is stored in the DB 910. The ARN may also monitor 
loading characteristics of other service nodes. In one
embodiment, the ARNs exchange loading information
with each other. For example, the loading characteristics of
the service nodes 938 and 940 are monitored by ARN 930 
and stored in DB 934. The ARN 930 may exchange this 
loading information with the ARN 908 as shown at 954. In 
another embodiment, the ARNs may actively probe other 
service nodes to determine their loading characteristics. 
These characteristics can then be stored for future use. For 
example, the ARN 908 probes service node 944 as shown at 
956 and also probes service node 942, as shown at 958. 
Therefore, there are several ways in which the ARN can 
determine loading characteristics of service nodes located in 
both the local network and over the wide area. 

At some point in time, client 950 attempts to receive the 
information content 928. The client 950 sends an anycast 
request 952 into network 902 where the request 952 is 
received by the ARN 908. The ARN 908 may redirect the
client 950 to one of the local service nodes (912, 914).
However, the DB 910 associated with ARN 908 shows that
the local service nodes may not be able to provide the
requested services to client 950. The ARN 908 is able to
use the information DB 910 to determine which service
node would be most appropriate to handle the request from 
client 950. The selected service node is not limited to those 
in the local network with the ARN 908. Any service node 
over the wide area may be selected. 

The ARN 908 determines that service node 942 should 
service the request from client 950. The ARN 908 sends a 
redirection message 960 to the client 950, and thereby 
redirects the client to the service node 942. The client 950 
sends the request to the service node 942 using a transport 
layer protocol like TCP, as shown at 962. The service node 
942 responds by providing the client with the requested 
information content as shown at 964. 

Therefore, using the information database and the ability 
to probe service nodes to obtain loading characteristics, the 
referral nodes are able to effectuate wide-area load
balancing in accordance with the present invention.
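The overflow decision walked through for FIG. 7 can be sketched as a local-first choice that falls back to the wide area. The node identifiers reuse the reference numerals from the figure; the utilization values in the usage note and the `capacity_threshold` parameter are invented for illustration and do not come from the description.

```python
def redirect_target(local_nodes, remote_nodes, load, capacity_threshold=0.9):
    """Prefer a local service node with spare capacity, else overflow.

    load: dict mapping node id -> utilization in [0.0, 1.0], as might be
          held in an ARN's information database (hypothetical layout).
    Returns the chosen node id, or None if every known node is saturated.
    """
    def least_loaded(nodes):
        known = [n for n in nodes if n in load]
        return min(known, key=lambda n: load[n]) if known else None

    for group in (local_nodes, remote_nodes):
        best = least_loaded(group)
        if best is not None and load[best] < capacity_threshold:
            return best
    return None
```

For instance, with assumed loads of 0.97 and 0.95 on nodes 912 and 914 and 0.3 on node 942, the function skips the local nodes and returns 942, which matches the redirection shown at 960.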
Technical Extensions 

This section describes additional embodiments of
the invention that comprise technical extensions to the
embodiments described above. 
Last-hop Multicast 

The use of IP Multicast could be exploited locally as a 
forwarding optimization in the "last-hop" delivery of broad- 
cast content. Thus, it is possible for a client to issue an 
anycast request, and as a result, be redirected to join a 
multicast group. 

FIG. 8 illustrates an embodiment of the present invention 
that uses IP Multicast. A content provider 600 provides three 
service nodes SN0-SN3 for providing information content 
602 via an application-level multicast tree 604. A client 606
requests a service feed, as described above, that is received by
the ARN, as shown at path 608. The ARN redirects the 
request to the service node SN0 to initiate a data transfer, as 
shown at path 610. Rather than initiate a separate data 
channel for each client, however, the service node instructs 
the client (via the control connection) to subscribe to a 
particular multicast group 612 (say group G) to receive the 
information content. The client then joins the multicast 
group and the service node SN0 transmits the information 
content to the group in the local environment. 

As shown, the multicast traffic is replicated only at fan out 
points in the distribution path from the service node SN0 to 
all clients receiving the flow. Simultaneously, the service 
node SN0 would contact an upstream service node SN1 to 
receive the information content over a unicast connection. In
this fashion, content is broadcast to all interested receivers
without having to enable multicast throughout the entire 
network infrastructure. 
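On the client side, subscribing to the group named by the service node could look like the sketch below, which uses the standard sockets interface for IP Multicast membership. The group address and port would arrive over the control connection; the function names here are illustrative, and the description does not prescribe this interface.

```python
import socket
import struct


def build_membership_request(group_ip, interface_ip="0.0.0.0"):
    """Pack the ip_mreq structure passed to IP_ADD_MEMBERSHIP.

    interface_ip "0.0.0.0" lets the operating system pick the interface.
    """
    return struct.pack("4s4s",
                       socket.inet_aton(group_ip),
                       socket.inet_aton(interface_ip))


def join_group(group_ip, port):
    """Open a UDP socket and join multicast group G on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_membership_request(group_ip))
    return sock
```

After `join_group` returns, the client would read the information content from the socket with `recvfrom`, while the service node sends a single copy to the group address.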
Sender Attachment 

The system described thus far has relied on anycast 
routing to route client requests to the nearest service nodes. 
Similarly, anycast could be used to bridge the server at the 
originating site of the content to the closest service entry
point. If content servers are explicitly configured into a 
broadcasting infrastructure, the system described herein 
could be adapted for registering and connecting service 
installations to the broadcasting infrastructure. 

FIG. 9 shows an embodiment 700 of the present invention 
adapted for registering and connecting service installations 
to the broadcasting infrastructure. A service node SN0 
wishes to inject a new broadcast channel from a nearby 
server S into a content broadcast network 704. The service 
node SN0 sends a service query 706 using an anycast 
address. The service query requests the identity of a service 
node within the broadcast network 704 most available to 
serve as the endpoint for a new IP tunnel from SN0. The
service query carries an anycast address, and is routed to the
nearest ARN, in this case, A1. A1 selects the most available
service node, and may also update a channel database within
broadcast network 704 indicating that the new channel is
available through SN0. In this case, A1 selects SN1 and
sends a response 708 to SN0. The response instructs SN0 to
establish a new IP tunneling circuit 702 to service node SN1.

This sender attachment system allows an overlay broad- 
cast network to be dynamically extended to reach additional 
servers. This sender attachment system, when used with the 
client-attachment systems described previously, provides a
comprehensive architecture for dynamically mapping client- 
server traffic onto a series of one or more tunneling circuits, 
with one tunneling endpoint nearest the client, and one 
tunneling endpoint nearest the server. The mapping is per- 
formed in a way that is transparent to the client and server 
applications, and is transparent to their respective access 
routers. 

Multiple Masters 

In various embodiments described herein, only the master 
AS advertises the anycast address block via the interdomain 
routing protocol. There are two extensions to this scheme. 
First, the system can be extended to allow multiple master 
AS's to coexist by partitioning the anycast address space 
among them. That is, multiple instances of the system 
described herein would be fully functional and non- 
interfering as long as they use distinct address spaces for 
their anycast blocks. Second, the system can be extended to 
allow multiple master AS's to advertise the same or over- 
lapping address blocks. In this case, the minimum-distance 
anycast routing would operate at both the intradomain and 
interdomain routing levels. For example, there could be a 
master AS in North America and a master AS in Europe
advertising the same anycast block externally, e.g. via BGP. 
Then, a packet sent from an arbitrary client would be sent to 
whichever master AS is closest, and once inside that AS, the 
packet is routed to the nearest service node therein or 
redirected across the wide-area if necessary. 
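For the first extension, a simple check that the anycast blocks assigned to the master AS's are pairwise disjoint can be sketched with Python's standard `ipaddress` module. The AS names and CIDR blocks used in the usage note are made-up documentation-range values, not addresses from the description.

```python
import ipaddress
from itertools import combinations


def overlapping_blocks(assignments):
    """Return pairs of master AS names whose anycast blocks overlap.

    assignments: dict mapping a master AS name to its anycast block in
                 CIDR notation (hypothetical configuration format).
    An empty result means the address space is cleanly partitioned.
    """
    nets = {name: ipaddress.ip_network(cidr)
            for name, cidr in assignments.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]
```

For example, `overlapping_blocks({"as-na": "192.0.2.0/25", "as-eu": "192.0.2.128/25"})` returns an empty list, so the two instances would not interfere. A non-empty result corresponds to the second extension, in which the overlap is intentional and minimum-distance routing picks the closest master AS.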

The present invention provides a comprehensive redirec- 
tion system for content distribution based on a virtual 
overlay broadcast network. It will be apparent to those with 
skill in the art that the above methods and embodiments can 
be modified or combined without deviating from the scope 
of the present invention. Accordingly, the disclosures and 
descriptions herein are intended to be illustrative, but not 
limiting, of the scope of the invention which is set forth in 
the following claims.

What is claimed is:

1. A packet-switched network including addressable routers
for routing packet traffic, wherein a packet of data is
routed from a source node to a destination node based on
address fields of the packet, the packet-switched network
comprising:

at least one service node coupled to at least first one of the
addressable routers and having logic to propagate data
packets between a client and a plurality of nodes in an
anycast group, and

at least one redirector coupled to at least a second one of
the addressable routers, the at least one redirector
comprising:

A) logic for advertising, to the at least a second one of
the addressable routers, reachability to an anycast
destination address associated with the plurality of
nodes in the anycast group, wherein a packet sent to
the anycast destination address can be routed to a
plurality of destination nodes;

B) logic for accepting a service request from the client,
wherein the service request is an anycast message to
the anycast destination address; and

C) logic for generating a redirection message directed
to the client for redirecting the service request to the
at least one service node.

2. The packet-switched network of claim 1 wherein the at
least one service node comprises a plurality of service nodes
and the at least one redirector comprises:

logic to determine a selected service node from the
plurality of service nodes for handling the service
request; and

logic for generating a redirection message directed to the
client for redirecting the service request to the selected
service node.

3. The packet-switched network of claim 2 wherein the
logic to determine the selected service node from the
plurality of service nodes comprises:

logic to monitor a network traffic condition at the plurality
of service nodes; and

logic to select the selected service node from the plurality
of service nodes based on the network traffic condition.

4. The packet-switched network of claim 2 wherein the
logic to determine the selected service node from the
plurality of service nodes comprises:

logic to monitor a server condition at the plurality of
service nodes; and

logic to select the selected service node from the plurality
of service nodes based on the server condition.

5. The packet-switched network of claim 1, wherein a first
portion of the plurality of nodes in the anycast group are
located at a first geographic location, and wherein a second
portion of the plurality of nodes in the anycast group are
located at a second geographic location, the redirector
further comprising:

logic for determining whether the client sending the
anycast service request is closer to the first portion of
nodes in the anycast group or the second portion of
nodes in the anycast group; and

logic for generating the redirection message directed to
the client for redirecting the service request to a first
service node if the client is closer to the first portion of
nodes in the anycast group and for redirecting the
service request to a second service node if the client is
closer to the second portion of nodes in the anycast
group.

6. A method of operating a packet-switched network
including addressable routers for routing packet traffic,
wherein a packet of data is routed from a source node to a
destination node based on address fields of the packet, and
wherein the packet-switched network includes a redirector
coupled to at least one of the addressable routers and at least
one service node, the method comprising:

advertising, to an addressable router coupled to the
redirector, reachability to an anycast destination
address from the redirector, wherein a packet sent to the
anycast destination address can be routed to a plurality
of destination nodes;

accepting a service request from a client at the redirector,
wherein the service request is an anycast message to the
anycast destination address; and

generating a redirection message directed to the client for
redirecting the service request to the at least one service
node.

7. The method of claim 6 wherein the at least one service
node comprises a plurality of service nodes, and the step of
generating comprises steps of:

determining a selected service node from the plurality of
service nodes for handling the service request; and

generating a redirection message directed to the client
for redirecting the service request to the selected
service node.

8. The method of claim 7 wherein the step of determining
comprises steps of:

monitoring a network traffic condition at the plurality of
service nodes; and

selecting the selected service node from the plurality of
service nodes based on the network traffic condition.

9. The method of claim 7 wherein the step of determining
comprises steps of:

monitoring a server condition at the plurality of service
nodes; and

selecting the selected service node from the plurality of
service nodes based on the server condition.

10. The method of claim 6, wherein a first portion of the
plurality of nodes in the anycast group are located at a first
geographic location, and wherein a second portion of the
plurality of nodes in the anycast group are located at a
second geographic location, and the step of generating
comprising steps of:

determining whether the client sending the anycast service
request is closer to the first portion of nodes in the
anycast group or the second portion of nodes in the
anycast group; and

generating the redirection message directed to the client
for redirecting the service request to a first service node
if the client is closer to the first portion of nodes in the
anycast group and for redirecting the service request to
a second service node if the client is closer to the
second portion of nodes in the anycast group.

11. A method of operating a redirector in a packet-switched
network including addressable routers for routing
packet traffic, wherein a packet of data is routed from a
source node to a destination node based on address fields of
the packet, the method comprising:

advertising, to an addressable router coupled to the
redirector, reachability to an anycast destination
address from the redirector, wherein a packet sent to the
anycast destination address can be routed to a plurality
of destination nodes;

accepting a service request from a client, wherein the
service request is an anycast message to the anycast
destination address;

determining a selected server for handling the service
request, the selected server being one of a plurality of
servers that can handle the service request; and

generating a redirection message directed to the client for
redirecting the service request to the selected server.

12. The method of claim 11 further comprising a step of 
monitoring a traffic condition of the plurality of servers. 

13. The method of claim 12 wherein the step of deter- 
mining comprises a step of determining the selected server 
from the plurality of servers based on the traffic condition. 

14. The method of claim 11 further comprising a step of 
monitoring a server condition of the plurality of servers. 

15. The method of claim 14 wherein the step of deter- 
mining comprises a step of determining the selected server 
from the plurality of servers based on the server condition. 

16. The method of claim 11 further comprising a step of 
handling the service request at the redirector. 

17. The method of claim 11 wherein the step of generating
comprises a step of generating a redirection message 
directed to the client for redirecting the client to subscribe to 
a multicast group at the selected server. 

18. In a packet-switched network including addressable 
routers for routing packet traffic, wherein a packet of data is 
routed from a source node to a destination node based on 
address fields of the packet, an improvement comprising: 

a redirector coupled to at least one of the addressable
routers, the redirector including:

logic for advertising, to the at least one of the addressable
routers, reachability for an anycast destination
address, wherein a packet sent to the anycast destination
address can be routed to a plurality of destination
nodes;

logic for accepting a service request from a client, 
wherein the service request is an anycast message to 
the anycast destination address; 

logic for determining a selected server for handling the 
service request, the selected server being one of a 
plurality of servers that can handle the service 
request; and 

logic for generating a redirection message directed to 
the client for redirecting the service request to the 
selected server. 

19. The redirector of claim 18 wherein the logic for 
determining comprises: 

logic for monitoring a network traffic condition of the
plurality of servers; and

logic for selecting the selected server from the plurality of
servers based on the network traffic condition.

20. The redirector of claim 18 wherein the logic for 
determining comprises: 

logic for monitoring a server condition of the plurality of 
servers; and 

logic for selecting the selected server from the plurality of 
servers based on the server condition. 

21. The packet-switched network of claim 18, wherein the 
selected server is a multicasting server. 

22. The packet-switched network of claim 18, wherein the
redirector is the selected server.